(57) ABSTRACT

A software layer (filter driver) residing between software
components or application programs running locally or on a
client across a network and a persistent store of an operating
system provides on-the-fly conversions of persistent information
formats. The filter driver determines which format a
program expects, and dynamically converts the information
from its storage format to the format expected by the
program. Conversion includes both data format conversion,
and conversion of access semantics. Loadable conversion
modules are provided for converting application specific
formats due to the potential large number of such formats
which can be encountered. The filter driver may change the
format that information is stored in based on access history
or other system requirements. The software components or
application programs may be ignorant of the true storage
format used by the system, and thus the filter driver can be
used to transparently give old versions of software access to
information stored in newer formats.

28 Claims, 5 Drawing Sheets 



[Representative drawing, showing the same blocks as FIG. 2: a client system with an application (210), system libraries and components (212), an I/O manager (214), a dynamic conversion filter driver (216) with loadable conversion modules, and a remote filesystem client (218), connected through a network (220, 224) to a server system (222) with a remote filesystem server (226), an application, system libraries and components (240), an I/O manager, a dynamic conversion filter driver (230) with loadable conversion modules (232), a native filesystem (234), and virtual/physical disks (236).]



05/20/2003, EAST Version: 1.03.0002 



U.S. Patent    Apr. 15, 2003    Sheet 1 of 5    US 6,549,918 B1







[FIG. 2 (Sheet 2 of 5): block diagram. Client system (208): application, system libraries and components (212), I/O manager (214), dynamic conversion filter driver (216) with loadable conversion modules, remote filesystem client (218), network (220). Server system (222): network (224), remote filesystem server (226), application (238), system libraries and components (240), I/O manager (228), dynamic conversion filter driver (230) with loadable conversion modules (232), native filesystem (234), virtual/physical disks (236).]

FIG. 2
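The description accompanying FIG. 2 says that conversion modules can be stacked (a conversion between formats A and B combined with one between B and C yields A to C), and that a single direct conversion is preferred when one exists. The following Python fragment is only an illustrative sketch of that idea under stated assumptions: `ConversionRegistry`, `register`, `find_chain`, and `convert` are hypothetical names that do not come from the patent, converters are modeled as plain functions on bytes, and the shortest chain is found with a breadth-first search, which is one possible policy rather than the patented method.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical model: a conversion module is a function from bytes to bytes.
Converter = Callable[[bytes], bytes]


class ConversionRegistry:
    """Toy registry of loadable conversion modules keyed by (source, target)."""

    def __init__(self) -> None:
        self._modules: Dict[Tuple[str, str], Converter] = {}

    def register(self, src: str, dst: str, fn: Converter) -> None:
        self._modules[(src, dst)] = fn

    def find_chain(self, src: str, dst: str) -> Optional[List[Tuple[str, str]]]:
        """Return the shortest list of registered steps leading from src to dst."""
        if src == dst:
            return []
        queue = deque([(src, [])])
        seen = {src}
        while queue:
            fmt, path = queue.popleft()
            for (a, b) in self._modules:
                if a != fmt or b in seen:
                    continue
                new_path = path + [(a, b)]
                if b == dst:
                    return new_path
                seen.add(b)
                queue.append((b, new_path))
        return None

    def convert(self, data: bytes, src: str, dst: str) -> bytes:
        """Apply the stacked conversions in order."""
        chain = self.find_chain(src, dst)
        if chain is None:
            raise LookupError(f"no conversion path from {src} to {dst}")
        for step in chain:
            data = self._modules[step](data)
        return data
```

Because the search is breadth-first, a directly registered A-to-C module is chosen over the two-step A-to-B-to-C stack, which matches the remark that stacking is best avoided for performance when a single conversion is available.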






[FIG. 3 (Sheet 3 of 5): block diagram. Blocks labeled: Word (312); OLE32 (314); Windows95 (network) WindowsNT 5.0 (316); format-specific metadata; three user data virtual streams; EOF; dynamic storage conversion filter. Other reference numerals visible in the scan: 330, 332, 334, 336, 338, 340.]

FIG. 3






[FIG. 4 (Sheet 4 of 5): flowchart]

  410  START
  412  RETRIEVE NSS FILE, CHECK CACHE
  414  GENERATE DOCFILE ALLOCATION STRUCTURES
  416  CACHE SYNTHESIZED DATA
  417  PROVIDE DOCFILE EMULATION TO UPPER SOFTWARE LAYERS
  418  RECONVERT DOCVIEW TO NSS FORMAT WHEN DOCVIEW CLOSED
  420  END

FIG. 4
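The steps of the FIG. 4 flowchart can be illustrated with a small Python sketch. It is a toy model under loudly stated assumptions, not the actual NSS or docfile layout: an NSS file is represented as a dictionary of named streams, the synthesized docfile view is a single contiguous byte string plus an allocation table of (offset, length) pairs, and all names (`DocView`, `open_docview`, `read_stream`, `close_docview`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Toy stand-in for a multi-stream NSS file: stream name -> stream bytes.
NssFile = Dict[str, bytes]


@dataclass
class DocView:
    """Toy single-stream view: an allocation table plus contiguous data."""
    allocation: Dict[str, Tuple[int, int]]  # stream name -> (offset, length)
    data: bytes


def synthesize_docview(nss: NssFile) -> DocView:
    """Step 414: generate allocation structures for a single-stream layout."""
    allocation: Dict[str, Tuple[int, int]] = {}
    chunks = []
    offset = 0
    for name in sorted(nss):
        payload = nss[name]
        allocation[name] = (offset, len(payload))
        chunks.append(payload)
        offset += len(payload)
    return DocView(allocation, b"".join(chunks))


def open_docview(name: str, store: Dict[str, NssFile],
                 cache: Dict[str, DocView]) -> DocView:
    nss = store[name]                           # step 412: retrieve NSS file
    if name not in cache:                       # step 412: check cache
        cache[name] = synthesize_docview(nss)   # steps 414 and 416
    return cache[name]                          # step 417: emulated view


def read_stream(view: DocView, stream: str) -> bytes:
    """What an upper software layer would see through the emulated view."""
    offset, length = view.allocation[stream]
    return view.data[offset:offset + length]


def close_docview(name: str, store: Dict[str, NssFile],
                  cache: Dict[str, DocView]) -> None:
    """Step 418: reconvert the view to the multi-stream form when closed."""
    view = cache.pop(name)
    store[name] = {s: view.data[o:o + n] for s, (o, n) in view.allocation.items()}
```

In this sketch every stream is laid out back to back, so the synthesized allocation table is contiguous; a real implementation would have to follow the actual on-disk formats, which this chunk of the document does not specify.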






[FIG. 5 (Sheet 5 of 5): flowchart]

  510  MONITOR ACCESS REQUESTS
  520  MAINTAIN STATISTICS OF DESIRED FORMATS FOR EACH FILE
       Decision: STORED FORMAT IS BEST FORMAT? (a YES branch is labeled)
       STORE FILE IN DESIRED FORMAT
  (Reference numeral 530 also appears; the scan does not show which block it labels.)

FIG. 5
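The FIG. 5 flow (monitor access requests, keep per-file statistics of the formats requested, and re-store a file when its stored format is not the best one) can be sketched in Python as follows. This is a simplified illustration, not the patented cost model: the patent text speaks of estimating cost in terms such as CPU cycles and storage size, whereas this sketch assumes every conversion costs the same and simply counts requests. The class and method names (`FormatTracker`, `record_access`, `best_format`) are hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict


class FormatTracker:
    """Toy per-file statistics of which format each access asked for."""

    def __init__(self) -> None:
        self._requests: DefaultDict[str, Counter] = defaultdict(Counter)

    def record_access(self, file_id: str, requested_format: str) -> None:
        """Steps 510 and 520: monitor a request and update the statistics."""
        self._requests[file_id][requested_format] += 1

    def best_format(self, file_id: str, stored_format: str) -> str:
        """Decision step: pick the format that would need the fewest conversions.

        Assumption: equal cost per conversion, so the most requested format
        wins. Ties keep the current stored format to avoid needless rewrites.
        """
        counts = self._requests.get(file_id)
        if not counts:
            return stored_format
        top = max(counts.values())
        if counts.get(stored_format, 0) == top:
            return stored_format
        return max(counts, key=counts.get)


# Usage: a file stored as NSS but mostly opened by docfile-only clients.
tracker = FormatTracker()
for fmt in ["docfile", "docfile", "docfile", "nss"]:
    tracker.record_access("report.doc", fmt)
target = tracker.best_format("report.doc", stored_format="nss")
```

Here `target` differs from the stored format, so a caller following the flowchart would re-store the file in the desired format, for example at a time when the system is lightly loaded.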







DYNAMIC INFORMATION FORMAT CONVERSION

FIELD OF THE INVENTION

This invention relates generally to the field of processing computer information formats and more particularly to a method and system for dynamically accessing information in a format different than the format used by the computer system to internally represent the information.

COPYRIGHT NOTICE/PERMISSION

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. The following notice applies to the software and data as described below and in the drawing hereto: Copyright © 1998, Microsoft Corporation, All Rights Reserved.

BACKGROUND

Computer applications such as document processors, database programs, simulators, games, editors, and compilers all need to persist information even while the application is not running. Computer systems store persistent information in a variety of ways, including disk-based file systems, databases, hierarchical storage systems, internet servers, and distributed memory. Persistent application data is stored in different formats depending on the type of application, and even depending on the version of a single application. The format of the information is what gives meaning to the binary bits which are used to represent the information. The format includes both the explicit details of how to interpret the bits, as well as the rules that are to be observed when accessing the information such as how to concurrently access the data from multiple users, how to sequence modifications to the information so that it can be recovered after a system crash, or how to maintain auxiliary data used to manage the information for purposes such as workflow management, auditing, or security. Multiple formats can be applied to the same information. The persistent storage that holds the information produced by an application is sometimes referred to as a file. The computers on which such applications run have file systems and other persistent stores which store the files out onto memory devices in yet further formats. These multiple different formats, both at the application level and at the file server level lead to difficult interoperability problems. For example, a document produced by a later version of a document processor is often not readable by a previous version of the document processor. When a user buys a new computer loaded with the latest software, produces a document, and gives a copy of the document to someone else only having a previous version of the software, the copy can be useless and indecipherable by the previous version.

Further difficulties arise when a user desires to share documents and other files over a network with a person using a different operating system, or application, or even a different version of the same operating system or application. If the different systems use different formats for the information, due to changes in the applications, or internal operating system components, they may have difficulty sharing information. In particular the newer system or application may use an information format that was invented after the earlier system was developed. These difficulties also arise with different applications that use a common type of information, but expect different formats, such as image processing applications that use JPEG instead of GIF, or document processors which use HTML instead of Word® format. Incompatibilities can also be due to the file systems and other persistent stores used by different operating systems. One type of operating system has file servers that store data files formatted as a single stream. Applications interface with the file server via an interface, such as OLE32, and expected the data to be returned to it in a certain format. OLE32 was specifically designed to retrieve and transfer data in the single stream format of docfiles. A newer or different type of file format may use the same set of interfaces, but store the information in a different format, perhaps relying on a file system format that supports multiple streams in a single storage container, and this results in a compatibility problem.

Prior attempts to solve the problem of using different versions of applications and different applications storing data in different formats involved the use of conversion programs which performed explicit conversions on information between formats. Thus, when opening a document, a user would be presented with a choice of converting a document to a new format prior to opening it. Also, on storing out a document, a user may select many different application level formats in which to store it. These solutions worked well for new versions of software, where the support for such conversions was built into the programs, but did not work well when an older version of software was confronted with a data format produced by a newer version. If a user of the new version failed to explicitly save the information in a format that was understood by earlier systems, the information would be unavailable to users on earlier systems. Either the earlier system must be upgraded with a new program to convert the data, or the newer program must be started again and the file converted prior to trying to use the older version to work with it. This was an unsatisfactory solution because the older application or system would not understand that the information was in a newer format, and give the user confusing error messages. Even where the format problem could be detected, there were generally no tools available on the older system to effect the conversion.

The problem is also common on computers coupled by network, where a file server, remote database or other distributed persistent storage mechanism may store data in a newer file system format, or there may be multiple versions of the same software on different machines, and one user may not have access to newer versions in order to appropriately transform application information formats.

Some image processing applications keep an image file in an internal compressed format, and then use an operating system driver to transform the file to appear to be in a fixed set of well-known image formats (JPEG, GIF, etc). It does not allow modifications to the well-known formats, and is only involved in data format conversion.

Such solutions also fail to provide more than data format conversion. The 'how to' rules associated with the format are not implemented, so users cannot share or manage the information. This type of format conversion produces a copy of the information in the old format, which can be accessed or modified independently of the original, producing inconsistencies between the separately stored versions of the information.

There is a need for an easier and more convenient way to provide interoperability between different versions of applications and operating system persistent storage systems. There is a need for such a way which does not require modifications to the applications, and that is backward compatible with existing applications. The provision of such interoperability should be transparent to a user and should also be provided in an efficient manner. Further, it should allow persistent application information to be dynamically shared and managed according to the rules of the newer format, rather than requiring users of older software to only make a copy of the information in an older format.

SUMMARY OF THE INVENTION

An operating system layer resides between software components or application programs that expect information to be in one format and a persistent store manager of the operating system which maintains the information in its persistent form. The operating system layer, which is referred to as a filter driver, provides on-the-fly conversion between the file format expected by the application layer and the format used by the persistent store manager. The filter driver determines which format a program expects, and dynamically converts the information to such a format, including both the static layout of the binary data as well as the dynamic rules for how to access the data.

Computer programs access persistent information by invoking Application Programming Interfaces (APIs) which make copies of the information in the persistent store available in the program's memory, and also update the persistent store with any desired changes. In addition to the static binary data portion of the information, there is auxiliary information regarding aspects such as dates, security, amount of information available, and other properties. This auxiliary information is sometimes called 'meta-data.' The filter driver dynamically converts between formats by copying information between the persistent store and the application's memory according to a conversion algorithm, providing the application with a 'view' of the file that is different from the view offered by the underlying storage system. The 'converted view' provided by the filter driver does not necessarily mean that all the data and meta-data of the file has been converted. The requirement is only that the data that is copied into the application's memory appears to have been converted.

Both file system formats and application program specific formats are convertible by the filter driver. This allows applications and other programs to operate transparently with different file systems and older versions of applications without modification. In one instance of the invention, separate loadable conversion modules are provided for converting application specific formats due to the potential large number of such formats which can be encountered.

Loadable conversion modules are provided as either parts of the operating system or as parts of distinct applications. For example two versions of a word processor application might run on the same system, with the newer one storing documents in a different format. The newer version of the application could provide a conversion module for use by the filter driver to allow files created by the new application to be accessed by the old application.

The filter driver may reside in the kernel of an operating system of a computer system. Applications may be operating directly on the computer system, or may be networked to the computer system. In either event, the filter driver sits above a persistent store, such as a file system and intercepts requests for stored information coming from either the local Application Programming Interfaces (APIs) or across the network. An indication that an application requires a data format transformation is provided to the filter driver by either the application specifying the desired format, or it is deduced from information such as the version of the system or application opening the file. If no indication of the desired format is provided, an older version of the application is assumed which requires the information to be in an older, well-known format. The stored form of the information may [be] converted to an intermediate format which is maintained [by the] filter driver to handle semantic differences. The intermediate format may include cached information in order to improve performance and avoid having to convert files with each access. The filter driver may also keep a file [in] different formats depending on access history or other system requirements.

[illegible] information [is] accessed. The statistics are used to estimate [illegible] conversion from the various [illegible] the actual storage format. If it is estimated that [illegible] measured in cpu cycles, memory [illegible] disk storage size, and similar resource [illegible], will be less if a new stored format is used, the stored [form of the] information can be translated to a new format [illegible] weekend hours.

[illegible] applications to open files in the [illegible] underlying file [illegible] different. If the file's true [format and] expected format are compatible, the filter driver [illegible] succeed directly, bypassing the filter [driver]. If [illegible] incompatible, then when the application reads and writes the file, the filter driver causes [illegible] expected format. Semantic information regarding concurrent access between applications is [also] translated. Auxiliary information having implied semantics such as access control lists, management information, property sets, alternate representations, cached information, annotations, audit trails and other similar information [is] maintained and may be cached for faster access.

[illegible] system is converted at the same time. This makes upgrades easier to perform, and also allows upgrades to take place [illegible] is very important for organizations with large numbers of systems. Applications can also [illegible] files in a new context, such as in emails or copying [illegible] specific formats are required. [illegible] filter driver resides in or near the kernel, overhead [for] conversions are low, and conversion is transparent to the applications. Further, when converting back to an older format the filter driver can choose a more efficient representation of the information in the older format based on information in the newer format, such as in WindowsNT 5.0 where NSS to docfile conversion results in contiguous file allocation tables.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a hardware and operating environment of the present invention.

FIG. 2 is a high level block diagram showing the relationship between a filter driver of the present invention and other operating environment programs.

FIG. 3 is a block diagram of the conversion between NSS storage formats used in WindowsNT 5.0, and a docfile format expected by Windows95.
FIG. 4 is a flowchart of dynamic conversion between the NSS and docfile formats performed by the filter driver.

FIG. 5 is a flowchart of filter driver functions involved in tracking access and selecting formats for storage.

DETAILED DESCRIPTION

In the following detailed description of exemplary embodiments of the invention, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration specific exemplary embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that logical, mechanical, electrical and other changes may be made without departing from the spirit or scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined only by the appended claims.

The detailed description is divided into multiple sections. In the first section, the hardware and the operating environment in conjunction with which embodiments of the invention may be practiced are described. In the second section, the environment and operation of a filter driver for converting between selected formats is discussed. In the third section, different additional functions relating to the filter driver are discussed, followed by a conclusion which states some of the potential benefits and describes further alternative embodiments.

Hardware and Operating Environment

FIG. 1 provides a brief, general description of a suitable computing environment in which the invention may be implemented. The invention will hereinafter be described in the general context of computer-executable program modules containing instructions executed by a personal computer (PC). Program modules include routines, programs, objects, components, data structures, libraries, etc. that perform particular tasks or implement particular abstract data types. Those skilled in the art will appreciate that the invention may be practiced with other computer-system configurations, including hand-held devices, multiprocessor systems, microprocessor-based programmable consumer electronics, network PCs, minicomputers, desktop computers, engineering workstations, mainframe computers, and the like. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices, and some functions may be provided by multiple systems working together.

FIG. 1 employs a general-purpose computing device in the form of a conventional personal computer 20, which includes processing unit 21, system memory 22, and system bus 23 that couples the system memory and other system components to processing unit 21. System bus 23 may be any of several types, including a memory bus or memory controller, a peripheral bus, and a local bus, and may use any of a variety of bus structures. System memory 22 includes read-only memory (ROM) 24 and random-access memory (RAM) 25. A basic input/output system (BIOS) 26, stored in ROM 24, contains the basic routines that transfer information between components of personal computer 20. BIOS 26 also contains start-up routines for the system. Personal computer 20 further includes hard disk drive 27 for reading from and writing to a hard disk (not shown), magnetic disk drive 28 for reading from and writing to a removable magnetic disk 29, and optical disk drive 30 for reading from and writing to a removable optical disk 31 such as a CD-ROM or other optical medium. Hard disk drive 27, magnetic disk drive 28, and optical disk drive 30 are connected to system bus 23 by a hard-disk drive interface 32, a magnetic-disk drive interface 33, and an optical-drive interface 34, respectively. The drives and their associated computer-readable media provide nonvolatile storage of computer-readable instructions, data structures, program modules and other data for personal computer 20. Although the exemplary environment described herein employs a hard disk, a removable magnetic disk 29 and a removable optical disk 31, those skilled in the art will appreciate that other types of computer-readable media which can store data accessible by a computer may also be used in the exemplary operating environment. Such media may include magnetic cassettes, flash-memory cards, digital versatile disks, Bernoulli cartridges, RAMs, ROMs, tape archive systems, RAID disk arrays, network-based stores and the like.

Program modules may be stored on the hard disk, magnetic disk 29, optical disk 31, ROM 24 and RAM 25. Program modules may include operating system 35, one or more application programs 36, other program modules 37, and program data 38. A user may enter commands and information into personal computer 20 through input devices such as a keyboard 40 and a pointing device 42. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 21 through a serial-port interface 46 coupled to system bus 23; but they may be connected through other interfaces not shown in FIG. 1, such as a parallel port, a game port, or a universal serial bus (USB). A monitor 47 or other display device also connects to system bus 23 via an interface such as a video adapter 48. A video camera or other video source is represented at 60 as being coupled to video adapter 48 for providing video images for video conferencing and other applications, which may be processed and further transmitted by personal computer 20. In further embodiments, a separate video card may be provided for accepting signals from multiple devices 60, including satellite broadcast encoded images. In addition to the monitor, personal computers typically include other peripheral output devices (not shown) such as speakers and printers.

Personal computer 20 may operate in a networked environment using logical connections to one or more remote computers such as remote computer 49. Remote computer 49 may be another personal computer, a server, a router, a network PC, a peer device, or other common network node. It typically includes many or all of the components described above in connection with personal computer 20; however, only a storage device 50 is illustrated in FIG. 1. The logical connections depicted in FIG. 1 include local-area network (LAN) 51 and a wide-area network (WAN) 52. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.

When placed in a LAN networking environment, PC 20 connects to local network 51 through a network interface or adapter 53. When used in a WAN networking environment such as the Internet, PC 20 typically includes modem 54 or other means for establishing communications over network 52. Modem 54 may be internal or external to PC 20, and connects to system bus 23 via serial-port interface 46. In a
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networked environment, program modules, such as those 5.0, and supports multiple formats for document files. Docu- 

comprising Microsoft® Word which are depicted as residing ments stored on the WindowsNT FAT file system can be t 

within 20 or portions thereof may be stored in remote stored in the docfile format. Documents stored on the NT file / 

storage device 50. Of course, the network connections system NTFS) can be stored in either docfile or a native f 

shown are illustrative, and other means of establishing a 5 structm^srorage (NSS). Docfile format is also available on \ 

communications link between the computers may be sub- previous Microsoft Windows® systems, but NSS format is ^ 

stituted. available only on WindowsNT 5.0 or later. The NTFS file 

Software may be designed using many different methods, system stores data in a multi -stream format, with the dif- j 

including C, assembler, VisualBasic, scripting languages ferentstreamsrepresentingdifferenttypesor formats of data 

such as PERL or TCL, and object oriented programming -^g in a document, such as text, graph and spreadsheet. Appli- 

methods. C++ and Java are two examples of common object cations written with the NSS format, utilize the same set of ( 

oriented computer programming languages that provide interfaces which are used for docfiles, but the information is ^ 

functionality associated with object oriented programming. stored in the NSS format. When information stored in NSS . 

An interface is a group of related functions that are format is transferred to other systems, or even to file systemsj 

organized into a named unit. Each interface may be uniquely 15 on the same system which do not support the multi-streams 

identified by some identifier. Interfaces have no of NITS, there is a compatibility problem, 

instantiation, that is, an interface is a definition only without An application 238 residing on the server system 222 

the executable code needed to implement the methods which utilizes system libraries and components 240 and accesses 

are specified by the interface. An object may support an data stored on the server through I/O manager 228 as well, 

interface by providing executable code for the methods 20 Th^ ^^o fiUer drivers 216 and 230 can be stacked across the 

specified by the interface. The executable code supplied by network, and the conversion modules within a particular 

the object must comply with the definitions specified by the system can also be stacked. Thus, if there is a conversion 

interface. The object may also provide additional methods. provided between formats A and B on the client side, and 

Those skilled in the art will recognize that interfaces are not between B and C on the server side, they can be stacked to 

limited to use in or by an object oriented programming 25 get conversion between A and C. It is generally best to avoid 

environment. stacking conversions for performance reasons, but being 

In FIG. 2, a client system 208 comprises an application able to access data at any speed is better than not being able 
such as Microsoft Word which utilizes several system librar- to access it at all. If a single conversion between A and C is 
ies and components 212, and interfaces with an input/output available on only one of the client and server, only one 
(I/O) manager 214. The system libraries include OLE32 in 30 conversion need take place. It should be noted that a filter 
one embodiment, which comprises a set of well known driver on the server side may be all that is required, as it can 
interfaces for providing multiple streams and other internal provide format conversions on the client's view of informa- 
structure to a single information container stored by storage tion prior to providing it to the client, and also can appro- 
system or file system. The version of OLE32 in 212 is older, priately transform information provided by the client to the 
and uses a docfile format which stores information in a 35 proper format for storage on the server, 
single steam of the unit container provided by the file In further embodiments, the conversion modules provide 
system. A dynamic conversion filter driver 216 resides for conversion of data for different versions of a single 
between the I/O manager 214 and a file system client 218 on
an NTOS WindowsNT® kernel driver stack. The filter
driver 216 provides conversion between different file or
storage system formats to provide the application 210 and
system libraries 212 with the ability to access information in
the format that they know how to handle. Filter driver 216
will recognize the desired format, and provide a dynamic
view of the information in that format. Even error codes are
converted in case the application relies on such codes from
the file system that it expects is handling and storing the
data. The filter driver resides in the kernel of the operating
system in one embodiment, and also has the ability to invoke
loadable conversion modules for providing conversion of
further application level and file system level formats. The
loadable modules can be provided by the operating system.
They can also be provided by applications so that data
created in a new format can be made available to earlier
versions of the application that expect a different format.

Application 210 can also access data through a network
connection represented at 220. A server system 222 also
comprises a network connection 224 coupled to a remote
filesystem server 226 which is in turn interfaced to a server
I/O manager 228. I/O manager 228 routes file interactions
through a server conversion filter driver 230, which also has
the ability to invoke loadable conversion modules 232. The
filter driver 230 interfaces with a native file system 234
which stores data in a multi stream format on secondary
storage 236. Secondary storage 236 comprises virtual or
physical disks or other type of persistent storage. The native
file system 234 is provided by Microsoft® WindowsNT®

application, such as between documents stored in a Word 7.0
format and a Word 8.0 format. Software to perform such
conversions is well known in the art, and is highly dependent
on the particular applications involved. Further conversion
modules can provide data from different applications, such
as other word processors, spreadsheets, or imaging programs
which may have their own formats for storing data. One
example of such a format is the tag based format of
hypertext markup language (html). In this example, a word
processor which is not tag based, may store a document in
one format, and an html editor may request access to that
document in html format. Upon receiving such a request to
open the file containing the document, the filter driver may
invoke a conversion module to perform dynamic conversion
and provide an html view of the document to the html editor.
Further conversion of the underlying storage format may
also be required. These conversions can be transparent to the
html editor such that it believes that it directly accessed an
html file from the storage system. Upon completion of
viewing or editing the document, the reverse conversions are
performed, and the document is again stored in a non-html
format in the original file system format.

FIG. 3 illustrates the conversion at a data structure level.
An application such as Microsoft® Word indicated at 312
utilizes OLE32 interfaces 314 to access data it thinks is
stored in a Windows95® environment at 316. Block 316
indicates an expected file system such as those implemented
in Windows95 where files are stored in a docfile format, and
a network connection to a WindowsNT® 5.0 environment
where files are stored in native structured storage (NSS)
format which is only exposed through OLE32/Stg application
program interfaces (APIs). The version of OLE32 at
314 expects to view the data it deals with as if it were stored
in a single stream structured storage format (docfile) indicated
at 318 consistent with Windows95. The single stream format
comprises metadata 320, which includes items like file
allocation tables (FAT) which identify where the segments of
data 322, 324 and 326, which are logically connected to
comprise the application data of a file, are located on disk.
Metadata 320 also may include application specific information
such as document profiles and formatting information that
OLE32 314 uses, but is not normally seen by the application
312.
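
The role of the allocation table described above can be sketched in a few lines. This is a minimal illustration, not the actual docfile layout: the sector contents, the `END_OF_CHAIN` sentinel and the `read_stream` helper are all hypothetical names chosen for the example. It only shows how a FAT-like table links scattered segments into one logical stream.

```python
from typing import List

END_OF_CHAIN = -1  # hypothetical sentinel marking the last sector of a stream


def read_stream(sectors: List[bytes], fat: List[int],
                first_sector: int, size: int) -> bytes:
    """Reassemble one logical stream by following its chain through the table."""
    out = bytearray()
    seen = set()
    sector = first_sector
    while sector != END_OF_CHAIN:
        if sector in seen:
            raise ValueError("cycle in allocation table")
        seen.add(sector)
        out += sectors[sector]
        sector = fat[sector]  # the table entry names the next sector in the chain
    return bytes(out[:size])  # trim padding in the final sector


# One stream whose three segments are scattered across sectors 2, 0 and 3.
sectors = [b"BBBB", b"xxxx", b"AAAA", b"CC  "]
fat = [3, END_OF_CHAIN, 0, END_OF_CHAIN]
data = read_stream(sectors, fat, first_sector=2, size=10)
```

Because segments can be allocated and freed over time, the chain order need not match the on-disk order, which is why the table has to be consulted on every read.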

Although OLE32 314 expects the Word document to be
represented as a single-stream docfile, the OLE32 on the
WindowsNT 5.0 system has previously stored the document
using the new Native Structured Storage (NSS) format. The
data shown in 320, 322, 324, and 326 are logical views of
the data actually stored in 332, 334, 336 and 338.

A dynamic storage format conversion filter at 330 converts
between the docfile format view expected by Windows
95/OLE32 and the multi-stream structured NSS storage
format used on WindowsNT 5.0 by default as represented in
blocks 332, 334, 336, 338, 340 and 342. The native structured
storage is represented by a block 332 of synthesized
metadata, which comprises auxiliary information about the
file to aid in quickly converting it to the format desired by
the application. Pointers and hints about the conversion are
kept in streams represented by block 332. It can also include
audit trails of file access, such as the identity and time of
access to a file, and also a record of changes to allow
reconstruction of various temporal versions of the file.
Further information can include work flow semantics to
ensure that proper approvals are obtained prior to changing
a document by a member of a group, or to otherwise manage
work flow. The application data, or user data, is stored in
multiple streams as indicated at blocks 334, 336 and 338,
while native format specific metadata is stored at block 340
and comprises a separate stream of associated attributes such
as names of the files, and other well known information
related to the NTFS file system.

The on-the-fly conversion allows non-WindowsNT 5.0
clients, such as older version applications, to read and write
NSS files as if they were in docfile format, without severe
performance penalty. It also allows NSS files to be concurrently
accessed according to 'how-to' rules that satisfy both
older docfile and newer NSS format requirements. Files in
NSS format are not degraded to docfile format, unless
absolutely necessary, such as when DOCVIEW is corrupted,
or is being copied to a non-NTFS 5 volume.

The format of an information container includes more
than the logical layout and semantics of the data. There may
be auxiliary information that has implied semantics (ACLs,
reparse points, property sets, auxiliary data streams, alternate
representations, cached information, annotations, audit
trails, workflow specifications, synthesized data). Some of
the semantics can be far more complex than just what data
to provide for a read operation. Different status codes may
be returned depending on the history of operations by the
current, as well as other concurrent, applications and the
underlying state of the system and network. When information
is returned, the information itself may be dependent on
the histories in a formalized way. The filter driver must also
maintain the format semantics relating to extraordinary
events, such as system crashes. The details of all these
semantic considerations are part of the file format, and are
translated by the filter driver when providing a view of
information requiring dynamic information conversion.
When applications open files, or database records, or
other persistent information containers, they specify the
format version that they expect to see by means of a
parameter to the API, a naming convention, or through a
default expected-format rule, though the file may actually be
stored in a different format. If the file's true format and the
expected format are compatible, the filter driver allows
access to the information. When the application reads, writes
or otherwise accesses the file, the filter driver gives the
application a view of the file that appears to be in the
expected format.
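
The open-time behavior described above can be sketched as a lookup keyed on the stored and expected formats. This is a rough illustration under stated assumptions: the `FilterDriver` class, its `register` and `open_view` methods, and the format names `"plain"` and `"html"` are hypothetical and are not taken from any real driver interface.

```python
import html
from typing import Callable, Dict, Tuple

Converter = Callable[[bytes], bytes]


class FilterDriver:
    """Toy stand-in for a filter driver that picks a loadable conversion module."""

    def __init__(self) -> None:
        self._modules: Dict[Tuple[str, str], Converter] = {}

    def register(self, stored: str, expected: str, convert: Converter) -> None:
        # A loadable conversion module announces which format pair it handles.
        self._modules[(stored, expected)] = convert

    def open_view(self, stored_format: str, data: bytes,
                  expected_format: str) -> bytes:
        if stored_format == expected_format:
            return data  # no conversion needed, pass the data through
        key = (stored_format, expected_format)
        if key not in self._modules:
            raise LookupError(
                f"no conversion from {stored_format} to {expected_format}")
        return self._modules[key](data)


driver = FilterDriver()
driver.register(
    "plain", "html",
    lambda raw: b"<p>" + html.escape(raw.decode()).encode() + b"</p>")
view = driver.open_view("plain", b"a < b", "html")
```

Keying the registry on the pair of formats keeps the driver itself independent of any particular application format, which matches the motivation given for loadable modules.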

Besides the format of the data, the filter-driver also 
translates semantic information regarding concurrent access 
between applications expecting various formats, as well as 
maintaining auxiliary information and metadata, used for 
managing information and for other purposes, such as crash 
recovery or performance tuning. 
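
The concurrent-access rules that this document gives for NSS and docfile views (an NSS view open fails while a docfile view is open; a docfile view opened alongside an NSS view uses full conversion, and that mode persists until all views are closed; otherwise partial conversion is used) can be sketched as a small state tracker. The `ViewTracker` class and its return values are hypothetical names for illustration only.

```python
class ViewTracker:
    """Tracks open views of one file and decides how each new open is served."""

    def __init__(self) -> None:
        self.nss_views = 0
        self.docfile_views = 0
        self.full_mode = False  # sticky until every view of the file is closed

    def open_view(self, kind: str) -> str:
        if kind == "nss":
            if self.docfile_views:
                # The native view is unavailable while a docfile view is open.
                raise PermissionError("NSS view unavailable, retry as docfile")
            self.nss_views += 1
            return "native"
        if kind == "docfile":
            if self.nss_views:
                self.full_mode = True
            self.docfile_views += 1
            return "full" if self.full_mode else "partial"
        raise ValueError(f"unknown view kind: {kind}")

    def close_view(self, kind: str) -> None:
        if kind == "nss":
            self.nss_views -= 1
        elif kind == "docfile":
            self.docfile_views -= 1
        else:
            raise ValueError(f"unknown view kind: {kind}")
        if self.nss_views == 0 and self.docfile_views == 0:
            self.full_mode = False


tracker = ViewTracker()
first = tracker.open_view("nss")       # local open, passed down the stack
second = tracker.open_view("docfile")  # network open next to an NSS view
```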

When a file of a particular format-type is created, the filter 
driver picks a default format based on the format specified 
by the client and the target storage system where the file will 
reside. The filter driver not only provides a client with the 
expected format through dynamic conversion, it may also 
convert the actual format that a file is stored in. If an 
application opens a file in one format, but then changes the 
format itself to an unknown format, the final format may be 
used by the filter driver to store the modified file in the file 
system. 

Files moved between storage systems with different
characteristics may need to be converted to different formats.
Also, a filter driver may keep a file in different formats
depending on its access history, opting for the most common
access, or using a private internal format that isn't exposed
to applications.
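
A minimal sketch of choosing a storage format from access history might look like the following. The `AccessHistory` class, the 80% share and the ten-sample minimum are assumptions made for the example; the text does not prescribe a particular policy.

```python
from collections import Counter


class AccessHistory:
    """Counts which formats a file is requested in and suggests a storage format."""

    def __init__(self, stored_format: str, threshold: float = 0.8,
                 min_samples: int = 10) -> None:
        self.stored_format = stored_format
        self.threshold = threshold      # assumed share of accesses needed to switch
        self.min_samples = min_samples  # assumed minimum history before deciding
        self.counts: Counter = Counter()

    def record(self, requested_format: str) -> None:
        self.counts[requested_format] += 1

    def preferred_format(self) -> str:
        total = sum(self.counts.values())
        if total < self.min_samples:
            return self.stored_format  # not enough history yet
        fmt, hits = self.counts.most_common(1)[0]
        if fmt != self.stored_format and hits / total >= self.threshold:
            return fmt
        return self.stored_format


history = AccessHistory("nss")
for _ in range(9):
    history.record("docfile")
history.record("nss")
choice = history.preferred_format()
```

A real policy would presumably also weigh the cost of the conversion itself, so the counts here are only one possible input.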

In one embodiment of the invention, the filter driver is
used to provide dynamic conversion between the multi-stream
NSS format file used by the OLE32 component in
WindowsNT 5.0 and the single-stream docfile used by
OLE32 in earlier systems. The internal structure of a docfile
can be quite complex, due to the allocation/de-allocation of
FAT sectors over time. NSS files leverage the implementation
of multi-stream files in NTFS to remove the allocation
structures (FAT, DIF (double indirect FAT), and mini-FAT).
When an open is requested at 410 in FIG. 4 and the filter
driver synthesizes the docfile format from an NSS file, the NSS
file is retrieved at 412, and a cache is checked to see if
allocation structures already exist. If not, appropriate docfile
allocation structures are generated at 414. However, they can
be given a clean structure using contiguous allocations,
making the conversion from multi-stream format relatively
easy to perform on-the-fly. The synthesis requires additional
cycles on the server, but the synthesized data is cached at
416 between opens of docfile views. The stream that contains
the synthesized docfile metadata is referred to as the
conversion stream. The conversion stream, together with the
NSS large streams and the ministream, which reside in
native NTFS streams, comprise the docfile view of the NSS
file which is provided to the application at 417.

Dynamic conversion maps docfile view read/write 
requests into accesses on the underlying NTFS streams 417. 
The key differences between the docfile and NSS formats are 
that the docfile FAT and DIF are missing from the NSS 
format. The allocation metadata for each large stream is 
maintained internally by NTFS. The header and the direc- 
tory stream have different representations. The NSS minis- 
tream uses contiguous allocation of small streams, eliminat- 
ing the need for the docfile mini-FAT. 
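
One way to picture the mapping of docfile view reads onto underlying streams is an extent table searched by view offset. The sketch below is an assumption-laden illustration: `Extent`, `ConversionMap`, and the stream names `"conversion"` and `"large0"` are hypothetical, and in-memory byte strings stand in for NTFS streams.

```python
import bisect
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Extent:
    view_offset: int    # where this range starts in the docfile view
    length: int
    stream: str         # which underlying stream holds the bytes
    stream_offset: int


class ConversionMap:
    def __init__(self, extents: List[Extent]) -> None:
        self.extents = sorted(extents, key=lambda e: e.view_offset)
        self.starts = [e.view_offset for e in self.extents]

    def read(self, streams: Dict[str, bytes], offset: int, length: int) -> bytes:
        """Serve a read on the view by stitching together the mapped ranges."""
        out = bytearray()
        end = offset + length
        while offset < end:
            i = bisect.bisect_right(self.starts, offset) - 1
            if i < 0:
                raise ValueError("offset is not mapped")
            e = self.extents[i]
            if offset >= e.view_offset + e.length:
                raise ValueError("offset is not mapped")
            n = min(end, e.view_offset + e.length) - offset
            src = e.stream_offset + (offset - e.view_offset)
            out += streams[e.stream][src:src + n]
            offset += n
        return bytes(out)


streams = {"conversion": b"HEADFATx", "large0": b"hello world"}
cmap = ConversionMap([
    Extent(0, 8, "conversion", 0),   # synthesized docfile metadata
    Extent(8, 11, "large0", 0),      # application data left in its native stream
])
chunk = cmap.read(streams, offset=4, length=9)
```

A write would follow the same lookup and update the underlying stream in place, so changes made through the view show up in the native streams.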
When a docfile view is extended, additional storage is
allocated at the end of the conversion stream. If a docfile
view is modified, the changes show up in the underlying
streams. When the final docfile view is closed, the filter
driver reconverts the internal conversion format that supports
the docfile view into NSS format at 418 and ends at
420. Reconversion is performed carefully so that it can be
recovered by the filter driver the next time the file is opened,
if the system crashes while reconversion is being performed.

After a docfile view is opened until reconversion is
complete, the native NSS view of the storage format of the
file is unavailable. If a concurrent NSSVIEW open is
attempted, it fails, and OLE32/STG will reopen for a
docfile view. If there is already an NSS view open when a
docfile view is opened, the filter driver follows the NSS/
docfile commit protocol to obtain a consistent view of the
NSS file. The filter driver then performs a full conversion by
copying all the NSS streams into the conversion stream after
the synthesized docfile metadata. Every subsequent docfile
view open performs another full conversion until all views
of the file have been closed. Each full conversion will
overwrite the previous conversion stream. The commit protocol
keeps existing docfile view opens from trying to take
a snapshot of the docfile view while a new docfile view open
does full conversion.

Local opens of NSS files normally use an NSS view of the
file, while network opens require a docfile view. For the
most part, the operations on an NSS view are just passed
down the driver stack. Operations on a docfile view are
mapped into operations on the underlying streams using the
conversion map, which specifies which parts of the docfile
view live in the conversion stream and which parts live in
the native NSS streams (i.e. the ministream and the large
streams). This mode of operation is called partial conversion.

When there are both NSS views and docfile views on the
same file, interoperation is complicated. In the general case,
commits can occur from either type of view. A commit from
an NSS view can be propagated to a docfile view simply by
regenerating the conversion map and conversion stream. But
a commit from a docfile view is much more expensive. The
FAT/DIF, mini-FAT and other docfile metadata have to be
analyzed. This is the same operation performed when the
last docfile view is closed (reconversion), but is too expensive
to perform at every commit of a docfile view.

To avoid the overhead of frequent reconversion if an NSS
view exists, the filter driver creates a complete copy of the
docfile view in the conversion stream rather than just the
docfile metadata 332. This mode is called full conversion.
Although the binary data in the format is fully converted in
this case, the 'how-to' semantics continue to be implemented
by the filter driver. Achieving the desired behavior requires
the cooperation of the OLE/NSS implementation. If a docfile
view is already open, the filter driver fails any NSS view
opens and OLE32 in WindowsNT 5.0 retries requesting a
docfile view.

The full conversion docfile view is updated every time a
new docfile view open occurs, if needed. The full conversion
is also updated at the end of every NSS view commit.

The NSS format introduces a new transaction implementation
for structured storage. Changes are recorded in a
transaction log in a scratch stream. A concurrent docfile view
open has to apply the changes specified in the log in order
to see the correct NSS file. If the system or application
crashed, then there will not be an NSS view when the log is
processed by the filter driver, and the scratch stream will be
deleted (and a ScratchID in an NSS header reset to invalid).

In NTFS, reparse points are used to mark NSS files so that
NSS doesn't have to read the header of every file opened on
the volume. The directory stream is marked sparse so its
length can be set to the length of the corresponding docfile
view without using twice the disk allocation. Both of these
designs are regarded as implementation details particular to
[illegible].

In one embodiment, the filter driver is implemented as
[illegible] (convert-NSS). The following changes to NTFS
and NT I/O [illegible] been added to the
NtCreateFile/NtOpenFile APIs. NSS files can be marked as
being reparse points. An open to a file with a reparse point
[illegible] STATUS_REPARSE and returns a 32-bit tag which
the filter driver uses to identify NSS files. Individual streams
of an NSS file can be marked SPARSE and extended to an
arbitrary size without taking up disk space. CNSS uses this
feature to make the size of the [illegible] data stream of an
NSS view appear to be the same as the docfile view size.
Support is also provided for associating file system filter
driver context with open files. Other operating systems may
require different modifications as will be apparent during
normal implementation of the concepts of the invention.

In FIG. 5, a process in the filter driver for changing the
format [illegible] depicted. At 510, file [illegible] monitored,
and statistics are generated at 520 regarding the various
formats the file is requested in. If the access pattern reaches
a threshold 525, the format that the [illegible] dynamically
be altered to be a different [illegible] for reasons of tuning
efficiency. If the file is already stored in the most efficient
format as indicated at 540, monitoring continues at 510. The
file may be stored in a different format if that format is
determined to be more efficient, or easier to convert from for
the various formats that the file is most commonly accessed
in. Efficiency is determined by augmenting the stored file
format to include history information about the formats and
types of accesses of the file. This information is used to
predict the amount of overhead required by dynamic
conversion from candidate storage formats. The overhead
includes a number of different factors including cpu cycles,
memory usage, and access latency. If the overhead for
accessing a file is predicted to be significantly lower (10-20%)
if the storage format is [illegible] converted. Static
conversion to a new format is performed during periods
[illegible] underutilized, such as evenings and weekends.

Conclusion

A conversion component referred to as a filter driver
provides a view of information stored in one format through
dynamic conversion to a requested format. The formats
comprise application specific formats, as well as particular
persistent store formats used by components of the system
such as a file system or a database. While the embodiments
described relate to the conversion between NSS and docfile
formats, it is recognized that many other conversions can be
performed, including dynamic conversions of application
specific formats such as those required by different versions
of a single or multiple applications. The invention may be
used to simplify the upgrade of information formats,
application versions, and operating systems. These conversions
can be provided by the conversion component or other
dynamically loadable conversion modules, which allow
format conversions to be provided by both operating systems
and applications. The conversion modules run in the
kernel below the I/O manager. Both networked and local
accesses are routed through the conversion component and
conversion is performed if needed, providing a general
solution to the problem of format incompatibilities between
different applications and operating systems and/or versions.
Cached data aids in the conversion in one embodiment. A
reparse point is used to indicate that a file should be
converted in one version of the invention, but other types of
indication, such as flags or table based mechanisms may also
be used. This application is intended to cover any adaptations
or variations of the present invention. It is manifestly
intended that this invention be limited only by the claims and
equivalents thereof.

We claim:

1. In a computer system having an application program, a
data storage system, and a filter driver residing outside of the
application program and operable for converting file
formats, a method of providing the application program
access in a first format portions of files stored in the data
storage system in a second format, the method comprising:

intercepting at the filter driver a request sent by the
program to access in the first format portions of a file
stored in the data storage system in the second format,
the requested portions of the file comprising less than
the entire file;

retrieving the requested portions of the file;

invoking in the filter driver a conversion module selected
from a plurality of conversion modules to convert the
requested portions of the file from the second format to
the first format; and

providing the application program access to the requested
portions of the file in the first format.

2. The method of claim 1 wherein one of the first and
second formats is a single stream structured storage format
and the other format is a multi stream structured storage.

3. The method of claim 1 further comprising changing one
of the portions of the file converted to the first format and
converting the changed portion of information back to the
second format.

4. The method of claim 1 further comprising creating an
intermediate format to encapsulate a format and semantic
difference between the first and second formats.

5. The method of claim 4 wherein the intermediate format
includes persisted cached information for use during a
subsequent format change.

6. In a standalone or networked computer system having
at least a data storage system, a program for accessing files
stored in the data storage system, and a filter driver residing
outside the program for converting file formats, a method of
accessing in a desired format portions of files stored in the
data storage system in formats different than desired by the
program, the method comprising:

intercepting at the filter driver a request sent by the
program to access in a first format portions of a file
stored in a second format in the data storage system, the
portions of the file comprising less than the entire file;

retrieving the requested portions of the file stored in the
second format;

invoking in the filter driver a loadable conversion module
selected from a plurality of conversion modules to
convert the portions of the file stored in the second
format to the first format; and

providing a view of the requested portions of the file from
the conversion module such that the portions of the file
stored in the second format can be accessed as though
they were stored in the first format.

7. The method of claim 6 wherein the filter driver resides
in the operating system and wherein the act of invoking
further comprises loading the conversion module from an
operating system.

8. The method of claim 6 further comprising changing one
of the portions of the file converted to the first format and
converting the changed portion of information back to the
[illegible]

9. The method of claim 6 wherein the loadable conversion
[illegible] format.

10. The method of claim 6 further comprising creating an
intermediate format to encapsulate a format and semantic
difference between the first and second formats.

11. The method of claim 10 wherein the intermediate
format includes persisted cached information for use during
a subsequent format change.

12. The method of claim 6 wherein one of the first and
second formats is a single stream structured storage format
and the other format is a multi stream structured storage.

13. The method of claim 6, further comprising concurrently
providing a view of the requested portions of the file in
the second format.

14. The method of claim 13 wherein the method is
performed by instructions stored on a computer readable
medium.

15. The method of claim 14 wherein the instructions are
part of a kernel of an operating system on a computer.

16. [illegible] operating system, and the act of invoking
further comprises loading the conversion module from the
program.

17. The method of claim 6 wherein one of the first and
second formats is a format according to one operating
system and the other is a format according to a different
operating system.

18. The method of claim 6 wherein one of the first and
second formats is a format according to one version of the
program and the other is a format according to a different
version of the program.

19. The method of claim 6 wherein one of the first and
second formats is a format according to one program and the
other is a format according to a different program.

20. The method of claim 19 wherein one of the two
programs is tag-based and the other is not tag-based.

21. The method of claim 19 wherein one of the two
programs is a data processing program adhering to a data
processing protocol and the other program is another data
processing program adhering to a different data processing
protocol.

22. The method of claim 21 wherein the data processing
protocol is an image processing protocol.

23. The method of claim 21 wherein the data processing
protocol is a sound processing protocol.

24. In a standalone or networked computer system having
at least a data storage system, a program for accessing in a
desired format a file stored in the data storage system, and
a filter driver that resides outside the program and converts
the format of the file to the format desired by the program,
a machine readable medium having instructions stored
thereon for causing a computer to perform a method of
accessing in the desired format the file in a different format
than desired, the method comprising:

receiving at the filter driver a request sent by the program
to access in a first format portions of the file stored in
a second format in the data storage system, the portions
of the file comprising less than the entire file;
retrieving the portions of the file stored in the second
format;

invoking in the filter driver a loadable conversion module
selected from a plurality of conversion modules to
convert requested portions of the file to the first format;
and

providing a view of the requested portions of the file from
the conversion module such that the portions of the file
stored in the second format can be accessed as though
they were stored in the first format.

25. The medium of claim 24 further comprising instructions
for causing the filter driver to invoke the loadable
conversion module.

26. The medium of claim 24 further comprising instructions
for causing the filter driver to maintain semantics of the
second format.

27. The medium of claim 24 wherein one of the first and
second formats is a single stream structured format and the
other format is a multi stream structured format.

28. In a standalone or networked computer system having
at least a data storage system, a program for accessing in a
desired format data stored in a binary data file in the data
storage system, and a filter driver that resides outside the
program for converting the format of the data stored in the
binary data file to the format desired by the program, a
method of accessing in the desired format portions of a
binary data file stored in a different format than desired in
comprising:
receiving at the filter driver a request sent by the program
to access in a first format portions of the binary data file
stored in a second format, the portions of the binary
data file comprising less than the entire binary data file;

retrieving the requested portions of the binary data file
stored in the second format;

invoking in the filter driver a loadable conversion module
selected from a plurality of conversion modules to
convert the requested portions of the binary data file to
the first format, the loadable conversion module being
selected based on the first format and the second
format;

providing a view of the requested portions of the binary
data file from the conversion module such that the
portions of the binary data file stored in the second
format can be accessed as though they were stored in
the first format; and

changing one of the portions of the binary data file
converted to the first format and converting the
changed portion of information back to the second
format.
* * * * *